The evolution of human language allowed the efficient propagation of nongenetic information,
thus creating a new form of evolutionary change. Language development in children offers the
opportunity of exploring the emergence of such a complex communication system and provides a
window to understanding the transition from protolanguage to language. Here we present the first
analysis of the emergence of syntax in terms of complex networks. A previously unreported, sharp
transition is shown to occur around two years of age from a (pre-syntactic) tree-like structure
to a scale-free, small world syntax network. The nature of such a transition supports the presence
of an innate component pervading the emergence of full syntax. This observation is difficult to
interpret in terms of any simple model of network growth, thus suggesting that some internal,
perhaps innate component was at work. We explore this problem by using a minimal model
that is able to capture several statistical traits. Our results provide evidence for adaptive traits,
but they also indicate that some key features of syntax might actually correspond to non-adaptive
phenomena.
I. INTRODUCTION

Human language stands as one of the greatest transitions in evolution
(Maynard-Smith and Szathmáry, 1997), but its exact origins remain a source of
debate and are considered one of the hardest problems in science
(Christiansen and Kirby, 2003; Számadó and Szathmáry, 2006). Since language
does not leave fossils, our windows to its evolution are limited and require
extrapolation from different sources of indirect information (Bickerton, 1990).
Among the relevant questions to be answered is the leading mechanism driving
language emergence: Is language the result of natural selection? The use of
population models under noisy environments is consistent with such a
selection-driven scenario (Hurford, 1989; Komarova and Niyogi, 2004;
Nowak and Krakauer, 1999).

Other approaches have suggested the importance of communicative constraints
canalizing the possible paths followed by language emergence
(Ferrer-i-Cancho and Solé, 2003). Supporting such a communication system there
has to be a symbolic system, which for some authors has been the core
question (Deacon, 1997). Finally, a rather different approach focuses on the
evolution of the machine that generates human language. The most remarkable
trait of such a machine is the possibility of generating infinite structures
(Chomsky, 1957; Hauser et al., 2002; Humboldt, 1999) in a recursive fashion.
The evolution of such an ability alone, beyond its potential functionality,
is considered by some authors the main problem in language evolution
(Hauser et al., 2002).

An alternative approach to this problem considers instead a non-adaptive
view. Roughly, language would be a "spandrel", i.e., an unselected side-effect
of a true adaptation (Gould, 2002; Gould and Lewontin, 1979). The term
spandrel was borrowed from architecture and refers



a CHI: [Telephone go right here] 
CHI: xxx [need it] [my need it] 
CHI: xxx (...) 
CHI: [Put in here] 
FIG. 1 Building the networks of Syntax Acquisition. First
we identify the structures in the child's productions (a) using the
lexico-thematic nature of early grammars (Radford, 1990); see
(Corominas-Murtra, 2007). Afterwards, a basic constituency
analysis is performed (b) assuming that the semantically most
relevant item is the head of the phrase and that the verb in
finite form (if any) is the head of the sentence. Finally (c) a
projection of the constituent structure in a dependency graph
is obtained.



to the space between two arches or between an arch and a rectangular
enclosure. In the context of evolution, a spandrel would be a phenotypic
characteristic that evolved as a side effect of a true adaptation. More
precisely, the features of evolutionary spandrels have been summarized
(Solé and Valverde, 2006) as follows: (a) they are the byproduct (exaptation)
of building rules; (b) they have intrinsic, well-defined, non-random features
and (c) their structure reveals some of the underlying rules of the system's
construction. This non-adaptive view has been criticized for a number of good
reasons (Dennett, 1995) but remains an important component of the evolution
debate. Within the context of language evolution, it has been suggested that
language would have been a consequence of a large brain, with neural
structures formerly used for other functions (Hauser et al., 2002).

Since there is no direct trace of primitive communication systems, we are
forced to study this problem by means of indirect evidence, in the hope that
"no event happens in the world without leaving traces of itself"
(Bickerton, 1990). The remarkable process of language acquisition in children
is probably the best candidate for such a trace of adaptation
(Bickerton, 1990; Maynard-Smith and Szathmáry, 1997). Confronted with the
surprising mastery of complex grammar achieved by children over two years
old, some authors concluded early on that an innate, hardwired element (a
language acquisition device) must be at work (Chomsky, 1988; Pinker, 1994;
Pinker and Bloom, 1990). Children are able to construct complex sentences by
properly using phonological, syntactic and semantic rules even though no one
teaches them. Specifically, they can generate a virtually infinite set of
grammatically correct sentences even though they have been exposed to a
rather limited number of input examples. Moreover, although the lexicon shows
a monotonic growth as new words are learned, the pattern of change in
syntactic organization is strongly nonlinear, with well-defined transitions
from babbling to a fully complex adult grammar through the one-word and
two-word stages (Radford, 1990).

How can children acquire such a huge set of rules? Are there some specific,
basic rules predefined as a part of the biological endowment of humans? If
so, some mechanism of language acquisition (the universal grammar) should
guide the process. In this way, models assuming a constrained set of
accessible grammars have shown that final states (i.e., an evolutionarily
stable complex grammar) can be reached under a limited exposure to the right
inputs (Komarova et al., 2001; Niyogi, 2006). However, we cannot deny the
fact that important features of the language acquisition process can be
obtained by appealing only to general purpose mechanisms of learning
(Elman, 1993; MacWhinney, 2005; Newport, 1990) or the importance of pure
self-organization in the structure of the speech code (Oudeyer, 2006;
Steels, 1997). An integrated picture should take into account the interaction
of some predefined grammar with general purpose mechanisms of learning and
code self-organization, structuring human languages as we know them today.
Under this view, the transition from protogrammar to grammar would be the
result of an innovation of brain organization rapidly predated for
communication (Hauser et al., 2002).

A quantitative analysis of language acquisition data is a necessary source of
validation of different hypotheses about language origins and organization.
Indeed, it is well accepted that any reasonable theory of language should be
able to explain how it is acquired. Here we analyze this problem by using a
novel approximation to language acquisition based on a global, network picture
of syntax. Instead of following the changes associated to lexicon size or
counting the number of pairs (or strings) of words, we rather focus on how
words relate to each other and how this defines a global graph of syntactic
links. We focus our analysis on the presence of marked transitions in the
global organization of such graphs. As shown below, both the tempo and mode
of network change seem consistent with the presence of some predefined
hardware that is triggered at some point of the child's cognitive
development. Furthermore, we explore this conjecture by means of an explicit
model of language network change that is able to capture many (but not all)
features of syntax graphs. The agreements and disagreements can be
interpreted in terms of non-adaptive and adaptive ingredients of language
organization.



II. BUILDING SYNTACTIC NETWORKS 



Language acquisition involves several well-known stages (Radford, 1990). The
first stage is the so-called babbling, where only single phonemes or short
combinations of them are present. This stage is followed by the lexical
spurt, a sudden lexical explosion where the child begins to produce a large
amount of isolated words. Such a stage is rapidly replaced by the two-word
stage, where short sentences of two words are produced. In this period, we do
not observe the presence of functional items nor inflectional morphology.
Later, close to two years of age, we can observe the syntactic spurt, where
sentences of more than two words are produced. The data set studied here
spans a time window including all the early, key changes in language
acquisition, from non-grammatical to grammatical stages.

In this paper we analyse raw data obtained from a child's utterances, from
which we extract a global map of the pattern of use of syntactic relations
among words. In using this view, we look for the dynamics of the large-scale
organization of the use of syntax. This can be achieved by means of complex
networks techniques, by aggregating all syntactic relationships within a
graph. Recent studies have shown that networks reveal many interesting
features of language organization (Ferrer-i-Cancho and Solé;
Ferrer-i-Cancho et al., 2004; Hudson, 2006; K[...]; Mel'čuk, 1989;
Sigman and Cecchi, 2002) at different levels. These studies uncovered new
regularities in language organization but so far none of them analyzed the
emergence of syntax through language acquisition. Here we study in detail a
set of quantitative, experimental data involving child utterances at
different times of their development.

Formally, we define the syntax network $\mathcal{G} = \mathcal{G}(\mathcal{W}, E)$
as follows (see fig. 1). Using the lexicon at any given acquisition stage, we
obtain the collection of words $W_i$ ($i = 1, \ldots, N_w$), every word being a
node $W_i \in \mathcal{G}$. There is a connection between two given words
provided that they are syntactically linked (see footnote 1). The set of links
$E$ describes



1 Recall that the net is defined as the projection of the constituency
hierarchy. Thus, the link does not have an ontological status under our
FIG. 2 Transitions from tree-like graphs to scale-free syntax graphs through the acquisition process. Here three snapshots of
the process are shown, at (a) 25 months, (b) 26 months and (c) 28 months. Although a tree-like structure is shown to be present
through the pre-transition (a-b), a scale-free, much more connected web suddenly appears afterward (c), just two months later.
The lower pictures indicate how the hubs are organized and their nature. There is a critical change at two years of age marked
by a strong reorganization of the network. Prior to the transition, semantically degenerated elements (such as it) act as hubs.
Key words essential to adult syntax are missing in these early stages. After the transition, the hubs change from semantically
degenerated to functional items (e.g., a or the). In (f) we highlight the core of this network (the hubs and their links) using
yellow nodes and edges.



all the syntactic relationships in the corpus. For every acquisition stage, we
obtain a syntactic network involving all the words and their syntactic
relationships. The structure of syntax networks will be described by means of
the adjacency matrix $A = [a_{ij}]$, with $a_{ij} = 1$ when there is a link
between words $W_i$ and $W_j$ and $a_{ij} = 0$ otherwise.
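
As a minimal sketch of this definition, the adjacency structure can be assembled by aggregating word-word links over all structures of a corpus. The function names and the (head, dependent) pairs below are hypothetical and chosen only for illustration; they are not the annotation actually used for Peter's corpora.

```python
from collections import defaultdict


def build_syntax_network(structures):
    """Aggregate the links of all structures into one undirected graph.

    `structures` is a list of structures, each given as a list of
    (head, dependent) word pairs. Returns a dict: word -> set of neighbors.
    """
    adj = defaultdict(set)
    for links in structures:
        for head, dependent in links:
            if head == dependent:
                continue  # self-links carry no syntactic relation
            adj[head].add(dependent)
            adj[dependent].add(head)
    return dict(adj)


def adjacency_entry(adj, wi, wj):
    """Return a_ij: 1 if words wi and wj are linked, 0 otherwise."""
    return 1 if wj in adj.get(wi, ()) else 0


# Hypothetical dependency pairs, loosely inspired by the utterances of fig. 1.
example = [
    [("go", "telephone"), ("go", "here"), ("here", "right")],
    [("need", "it")],
    [("need", "my"), ("need", "it")],
    [("put", "in"), ("in", "here")],
]
network = build_syntax_network(example)
degrees = {word: len(neighbors) for word, neighbors in network.items()}
```

Repeated links (such as need-it above) are stored only once, so the degree of a word counts distinct syntactic partners rather than frequency of use.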

Our corpora are extracted from recorded sessions where a child speaks with
adults spontaneously. We have collected them from the CHILDES Database
(MacWhinney, 2000) (see footnote 2). The analysis was performed using the
Dependency Grammar Annotator (Popescu, 2003). Specifically, we choose Peter's
corpora as a particularly representative and complete example
(Bloom et al., 1974, 1975). Time intervals are regular and the corpora span a
time window that can be considered large enough to capture statistically
relevant properties. Each corpus contains several conversations among adult
investigators and the child. However, the raw corpus must be parsed in order
to construct properly defined graphs. In (Corominas-Murtra, 2007) we present a
detailed description of the criteria and rules followed to pre-process the
raw data. The main features of the parsing algorithm are indicated in fig. 1
and can be summarized as follows:

1. Select only the child's productions, rejecting imitations,
onomatopoeias and undefined lexical items.

2. Identify the structures, i.e., the minimal syntactic 
constructs. 

3. Among the selected structures, we perform a ba- 
sic analysis of constituent structure, identifying the 
verb in finite form (if any) in different phrases. 

4. Project the constituent structures into lexical dependencies.
This projection is close to the one proposed by (Hudson, 2006)
within the framework of the network-based Word Grammar
(see footnote 3).

5. Finally, we build the graph by following the depen- 
dency relations in the projection of the syntactic 



view of syntax (Corominas-Murtra, 2007).
2 http://talkbank.org

3 Note that the operation is reversible, since one can rebuild the tree
from the dependency relations.
A. Global organization

In agreement with the well-known presence of two differentiated regimes, we
found that networks before the two-year transition (fig. 2a-b) show a
tree-like organization, suddenly replaced by much larger, heterogeneous
networks (fig. 2c) which are very similar to adult syntactic networks
(Ferrer-i-Cancho et al., 2004). This abrupt change indicates a global
reorganization marked by a shift in grammar structure. This is particularly
obvious when looking at the changes in the nature of hubs before and after
the transition. Highly connected words in the pre-transition stage are
semantically degenerated lexical items, such as it. After the transition,
hubs emerge as functional items, such as a or the. These hubs were
essentially nonexistent in previous stages, as displayed in fig. 3.



FIG. 3 Time evolution of word degrees through language acquisition.
Here four relevant words have been chosen: it, a, that, the. Their
degree has been measured in each corpus and displays a well-defined
change close to the critical age of $\approx 24$ months. Interestingly,
it is rapidly replaced by a as the main hub as soon as purely
functional words emerge. The gray area indicates the post-transition
(syntactic) domain.

structures found above. Dependency relations al- 
low us to construct a syntax graph. 

With this procedure, we will obtain a graph for every 
corpus. The resulting graphs will be our object of study 
in the following section. 

III. EVOLVING SYNTAX NETWORKS 

Here we analyze the topological patterns displayed by syntax networks at
different stages of language acquisition. To our knowledge, this is the first
detailed analysis of language network ontogeny. The resulting sequence
exhibits several remarkable traits. In fig. 2 we show three examples of these
networks. At early stages (fig. 2a,b) most words are isolated, indicating a
dominant lack of word-word linkage. Isolated words are not shown in these
plots. For each stage, we study the largest subset of connected words or
giant component (GC). The reason for considering the largest connected
component is that, from the very beginning, the GC is much larger than any
other secondary connected component and in fact the system shows an almost
all-or-none separation between isolated words and those belonging to the GC.
In other words, the giant component captures almost all word-word relations.
By sampling corpora at different times, we obtain a time series of connected
networks $\mathcal{G}(\mathcal{W}_T, E_T)$, where $\mathcal{W}_T$ and $E_T$
are the sets of words and links derived from the $T$-th corpus,
$T = 1, \ldots, 11$.
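
The extraction of the giant component can be sketched with a breadth-first search over the word graph. This is only an illustrative implementation under the assumption that the network is stored as a dictionary mapping each word (isolated ones included) to its set of neighbors; the helper names and the toy lexicon are hypothetical.

```python
from collections import deque


def connected_components(adj):
    """Return the connected components of an undirected graph as sets."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def giant_component(adj):
    """Return the subgraph induced by the largest connected component."""
    components = connected_components(adj)
    largest = max(components, key=len) if components else set()
    return {word: adj[word] & largest for word in largest}


# Toy pre-transition lexicon: one small tree, one pair and one isolated word.
toy = {
    "it": {"need", "want", "put"},
    "need": {"it"},
    "want": {"it"},
    "put": {"it"},
    "a": {"ball"},
    "ball": {"a"},
    "uh": set(),
}
gc = giant_component(toy)
```

In this toy case the giant component is the star centered on it, which mirrors the kind of hub described for the pre-transition stage.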



B. Average degree 

A first quantitative measure is the connectivity of every element. The number
of links (or degree) $k_i = k(W_i)$ of a given word $W_i \in \mathcal{W}$
gives a measure of the number of different syntactic relations in which such
a word participates. Figure 3 shows the time series of $k$ for several
relevant words. All of them display a sharp change around two years
($T = 5$). The gray area indicates the presence of syntactic organization,
and words such as a, the or that strongly increase their presence and take
control of the hub structure (compare with the previous figure). The
advantage of using degree as a measure of the relevance of a given word is
that this topological trait is largely independent of its frequency of
appearance.



C. Small world development 

Two important measures allow us to characterize the overall structure of
these graphs. These are the average path length $D_T$ and the clustering
coefficient $C_T$ (Watts and Strogatz, 1998). The first measure is defined as
the average $D_T = \langle D_{\min}(i,j) \rangle$, where $D_{\min}(i,j)$
indicates the length of the shortest path connecting nodes $W_i$ and $W_j$.
The average is performed over all pairs of words. Roughly speaking, short
path lengths mean that it is easy to reach any given word $W_i$ starting from
another arbitrary word $W_j$. Small path lengths in sparse networks are often
an indication of efficient information exchange. The clustering coefficient
$C_T$ is defined as the probability that two words that are neighbors of a
given word are also neighbors of each other (i.e., that a triangle is
formed). In order to estimate $C_T$, we define for each word $W_i$ a
neighborhood $\Gamma_i$. Each word $W_j \in \Gamma_i$ is syntactically related
(at least once) with $W_i$ in a production. The words in $\Gamma_i$ can also
be linked to each other,
the number of links (and thus the richness of syntactic
relations) experiences a sharp change.

The rapid increase in the number of links indicates a qualitative change in
network properties strongly tied to the reduction of the average path length.
A similar abrupt transition is observed for the clustering coefficient: in
the pre-transition stage $C_T$ is small (zero for $T = 1, 2, 3$). After the
transition, it experiences a sudden jump. Both $D_T$ and $C_T$ are very
similar to the values measured in syntactic graphs obtained from written
corpora (Ferrer-i-Cancho et al., 2004).



D. Scale-free topology 



FIG. 4 Changes in the structure of syntax networks in children
are obtained by means of several quantitative measures
associated to the presence of small world and scale-free
behavior. Here we display: (a) the average path length $D_T$,
(b) the number of words ($N_w$) and links $L$, and (c) the clustering
coefficient. As shown in (a) and (c), a small world pattern
suddenly emerges after an age of $\approx 24$ months. A rapid
transition from a large $D$ and low $C$ takes place towards a small
world network (with low $D$ and high $C$). After the transition,
well-defined scale-free graphs, with $P(k) \propto k^{-2.30}$, are
observed (d).



and the clustering $C(\Gamma_i)$ is defined as

$$C(\Gamma_i) = \frac{1}{k_i (k_i - 1)} \sum_{j \in \Gamma_i} \sum_{k \in \Gamma_i} a_{jk} \qquad (1)$$



The average clustering of the $\mathcal{G}_T$ network is simply
$C_T = \langle C(\Gamma_i) \rangle$, i.e., the average over all
$W_i \in \mathcal{W}$. Most complex networks in nature and technology are
known to be small worlds, meaning that they have short path lengths and high
clustering (Watts and Strogatz, 1998). Although language networks have been
shown to have small world structure (Ferrer-i-Cancho and Solé;
Ferrer-i-Cancho et al., 2004; Sigman and Cecchi, 2002;
Steyvers and Tenenbaum, 2005), little is known about how it emerges in
developing systems.
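
Both small world measures can be computed directly from the neighbor sets. The sketch below is only illustrative: it assumes an undirected, connected graph (such as the giant component) stored as a dictionary of neighbor sets, and it assigns zero clustering to words with fewer than two neighbors, a convention that the text does not specify.

```python
from collections import deque
from itertools import combinations


def local_clustering(adj, word):
    """Fraction of pairs of neighbors of `word` that are themselves linked."""
    neighbors = adj[word]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention for words with fewer than two links
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Average of the local clustering over all words."""
    return sum(local_clustering(adj, w) for w in adj) / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all pairs of a connected graph."""
    total = 0
    pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# A star (tree-like) graph versus a triangle with one pendant word.
star = {"it": {"x", "y", "z"}, "x": {"it"}, "y": {"it"}, "z": {"it"}}
tailed_triangle = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
```

Any tree, such as the star above, has zero clustering because it contains no triangles, which is consistent with the vanishing clustering reported for the earliest corpora.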

Two regimes in language acquisition can also be observed in the evolution of
the average path length (fig. 4a). It grows until it reaches a peak at the
transition (where the small world domain is indicated by means of the grey
area). Interestingly, at $T = 5$ the network displays the highest number of
words for the pre-transition stage. For $T > 5$, the average path length
stabilizes at $D_T \approx 3.5$ (see fig. 4b). The increasing trend of $D_T$
for $T < 5$ may be an indication that combinatorial rules are not able to
manage the increasing complexity of the lexicon. In fig. 4b we plot the
corresponding number of words $N_T$ and links $L_T$ of the GC as filled and
open circles, respectively. We can see that the number of connected words
that belong to the GC increases in a monotonic fashion, displaying a weak
jump at the age of two. However,


The small world behavior observed at the second phase is a consequence of the
heterogeneous distribution of links in the syntax graph. Specifically, we
measure the degree distribution $P(k)$, defined as the probability that a node
has $k$ links. Our syntactic networks display scale-free degree distributions
$P(k) \propto k^{-\gamma}$, with $\gamma \approx 2.3 - 2.5$. Scale-free webs
are characterized by the presence of a few elements (the hubs) having a very
large number of connections. Such heterogeneity is often the outcome of
multiplicative processes favouring already degree-rich elements to gain
further links (Barabási and Albert, 1999; Dorogovtsev and Mendes, 2001, 2003).

An example is shown in fig. 4d, where the cumulative degree distribution,
i.e.,

$$P_{>}(k) = \int_{k}^{\infty} P(k')\, dk' \sim k^{-\gamma + 1} \qquad (2)$$

is shown. The fitting gives a scaling exponent $\gamma \approx 2.3$, also in
agreement with the adult corpora studied. Hubs are responsible for the very
short path lengths and thus for the efficient information transfer in
complex networks.
Moreover, relationships between hubs are also interesting: the syntax graph
is disassortative (Newman, 2002), meaning that hubs tend to avoid being
connected to each other (Ferrer-i-Cancho et al., 2004). In our networks, this
tendency also experiences a sharp change close to the transition domain (not
shown), thus indicating that strong constraints emerge that limit the
syntactic linking between functional words.
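
A rough way to inspect such scaling is to compute the empirical cumulative degree distribution and the slope of its log-log plot. The sketch below is illustrative only: the least-squares fit is an assumption (the text does not state which fitting procedure was used), and the function names are hypothetical. Since the cumulative distribution scales as $k^{-\gamma+1}$, a slope $m$ corresponds to $\gamma = 1 - m$.

```python
import math
from collections import Counter


def cumulative_degree_distribution(degrees):
    """Return [(k, fraction of nodes with degree >= k)] for observed k."""
    n = len(degrees)
    counts = Counter(degrees)
    remaining = n
    result = []
    for k in sorted(counts):
        result.append((k, remaining / n))
        remaining -= counts[k]
    return result


def loglog_slope(points):
    """Least-squares slope of log(p) against log(k)."""
    xs = [math.log(k) for k, p in points if k > 0 and p > 0]
    ys = [math.log(p) for k, p in points if k > 0 and p > 0]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic points lying exactly on a power law with cumulative exponent 1.3.
synthetic = [(k, k ** -1.3) for k in (1, 2, 4, 8, 16)]
gamma_estimate = 1.0 - loglog_slope(synthetic)
```

For small networks such as those of early acquisition stages, a fit of this kind is noisy and should be read as a consistency check rather than as a precise estimate.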



IV. MODELING LANGUAGE ACQUISITION 

We have described a pattern of change in syntax net- 
works. The patterns are nontrivial and quantitative. 
What is their origin? Can we explain them in terms 
of some class of self-organization (SO) model? Are they 
instead associated to some internal, hardwired compo- 
nent? Here we present a new model of network evolution 
that tries to capture the observed changes and provides 
tentative answers to the previous questions. 







FIG. 5 Statistical patterns in language acquisition. In (a) an
example of the rank-frequency distribution of lexical items is
shown (here for Peter's corpus (see text) at stage $T = 2$ (1
year and 10 months)). The inset (b) displays three examples
of such skewed distributions in log-log scale for $T = 2$ (circles),
$T = 5$ (squares) and $T = 8$ (triangles). In (c) the evolution
of the mean length of structure $\langle L \rangle$ is displayed. It gives
an estimate of the (linear) complexity of the productions generated
at different stages. The dashed line indicates the two-word
production size. After stage $T = 5$, the MSL ($\langle s \rangle$ in the
text) comes close to two and a sharp change occurs. In (d)
we also show an example of the frequency distribution $N(L)$
for these productions in linear-log form for $T = 5$.



A. Simple SO graph growth models 

We explored several types of SO models without success. Appropriate models
should be able to generate: (a) sharp changes in network connectivity and (b)
scale-free graphs as the final outcome of the process. In relation to the
sudden shift, it is well known that a sharp change in graph connectivity
occurs when we add links at random between pairs of nodes until a critical
ratio of links against nodes is reached (Bollobás, 2001;
Erdős and Rényi, 1959). Starting from a set of $N$ isolated elements, once the
number of links $L$ is such that $p = L/N \approx 1$, we observe a qualitative
change in graph structure, from a set of small, separated graphs ($p < 1$) to
a graph structure displaying a giant component ($p > 1$) with a comparatively
small number of isolated subgraphs. This type of percolation model has been
widely used within the context of SO (Kauffman, 1993;
Solé and Goodwin, 2001). Unfortunately, such a transition is not satisfactory
to explain our data, since (a) it gives a graph with a Poissonian degree
distribution



(Bollobás, 2001), i.e.,

$$P(k) = \frac{\langle k \rangle^{k} e^{-\langle k \rangle}}{k!} \qquad (3)$$



and (b) there is no sharp separation between isolated 
nodes and a single connected graph, but instead many 
subgraphs of different sizes are observed. 
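
The percolation-like behavior of random link addition can be illustrated with a short simulation. This is a sketch under simple assumptions: links are drawn uniformly at random between distinct nodes, repeated links are allowed, and the ratios $L/N = 0.2$ and $L/N = 1.5$ are arbitrary values chosen well below and well above the transition region. The function name is hypothetical.

```python
import random


def largest_component_fraction(n_nodes, n_links, seed=0):
    """Add random links among isolated nodes and return the fraction of
    nodes that end up in the largest connected component."""
    rng = random.Random(seed)
    parent = list(range(n_nodes))  # union-find forest
    size = [1] * n_nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _ in range(n_links):
        a, b = rng.sample(range(n_nodes), 2)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            if size[root_a] < size[root_b]:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a
            size[root_a] += size[root_b]
    return max(size[find(x)] for x in range(n_nodes)) / n_nodes


n = 2000
sparse = largest_component_fraction(n, int(0.2 * n))
dense = largest_component_fraction(n, int(1.5 * n))
```

In the sparse case the largest component holds only a tiny fraction of the nodes, whereas in the dense case it spans most of the graph, with the remaining nodes scattered over many small subgraphs.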

Other models instead consider growing graphs using preferential attachment
rules (Barabási and Albert, 1999; Dorogovtsev and Mendes, 2001, 2003). In
these models the number of nodes grows by adding new ones which tend to link
with those having the largest connectivity (a rich-gets-richer mechanism).
Under a broad range of conditions these amplification mechanisms generate
scale-free graphs. However, the multiplicative process does not lead to any
particular type of transition phenomenon. The status of hubs remains the
same (they just win additional links). Actually, well-defined predictions
can be made, indicating that the degree of the hubs scales with time in a
power-law form (Barabási and Albert, 1999; Dorogovtsev and Mendes, 2001).
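
The rich-gets-richer mechanism can be sketched as follows. This is a minimal, assumed variant in which each new node brings a single link and attaches to an existing node with probability proportional to its degree; the function name and parameter choices are illustrative and not taken from the cited models' original specifications.

```python
import random


def preferential_attachment_tree(n_nodes, seed=0):
    """Grow a graph where each new node links to one existing node chosen
    with probability proportional to its current degree."""
    rng = random.Random(seed)
    degree = [0] * n_nodes
    degree[0] = degree[1] = 1
    edges = [(0, 1)]
    # Every node appears in `endpoints` once per link end, so a uniform
    # choice from this list is a degree-proportional choice of node.
    endpoints = [0, 1]
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)
        edges.append((new, target))
        degree[new] += 1
        degree[target] += 1
        endpoints.extend((new, target))
    return edges, degree


edges, degree = preferential_attachment_tree(2000)
hub_degree = max(degree)
median_degree = sorted(degree)[len(degree) // 2]
```

With one link per new node the graph stays a tree at all times while its hubs keep growing smoothly, so this kind of growth yields heterogeneity but no abrupt change of regime.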

Although many possible combinations of the previous 
model approaches can be considered, we have found that 
the simultaneous presence of both scale-free structure 
emerging on top of a tree and a phase transition between 
both is not possible. In order to properly represent the 
dynamics of our network, a data-driven approach seems 
necessary. 



B. Network growth model and analysis 

In order to reproduce the observed trends, we have developed a new model of
network evolution. The idea is to describe the process of network growth
without predefined syntactic rules. We make the simplistic assumption that
word interaction only depends on word frequency following Zipf's law. In this
context, it has been suggested that Zipf's law might be the optimal
distribution compatible with efficient communication
(Ferrer-i-Cancho and Solé, 2003; Ferrer-i-Cancho et al., 2005;
Harremoës and Topsøe, 2001; Solé, 2005). If no internal mechanisms are at
work, then our model should be able to capture most traits of the evolution
of syntax.

In order to develop the model, a new measure, close to the usual MLU (see
footnote 4) used in linguistics, must be defined. The structure length of the
$i$-th structured production ($s_i$) is measured by counting the number of
words that participate in the $i$-th syntactic structure. In our previous
example (see figure 1) we had 4 structures, of sizes $|s_1| = 4$,
$|s_2| = 2$, $|s_3| = 2$ and $|s_4| = 3$. Their average, the Mean Structure
Length (MSL), is $\langle s \rangle = 2.75$. In fig. 5c we can see how the MSL
evolves over time. The frequency of $s$, $p(s)$, was also measured and was
found to decay exponentially, $p(s) \propto e^{-|s|/\gamma}$, with
$\gamma = 1.40$ in this specific set of data (fig. 5d). We can connect the
two previous quantities through

$$\langle s \rangle = \frac{1}{Q} \sum_{s} |s|\, e^{-|s|/\gamma} \qquad (4)$$

where $Q$ is defined as the normalization constant:

$$Q = \sum_{s} e^{-|s|/\gamma} \qquad (5)$$



In the first five corpora, $\langle s \rangle < 2$. Beyond this stage, it
rapidly grows, with $\langle s \rangle > 2$ (see fig. 5b).
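
As a small worked example, the MSL of the four structures quoted above and the mean implied by an exponential length distribution can be computed as follows. The second function assumes that $p(s) \propto e^{-s/\gamma}$ holds for every integer length $s \geq 1$ up to a large cutoff, which is an idealization introduced here for illustration; the function names are hypothetical.

```python
import math


def mean_structure_length(sizes):
    """Average number of words per structure."""
    return sum(sizes) / len(sizes)


def exponential_msl(gamma, s_max=1000):
    """Mean length when p(s) is proportional to exp(-s / gamma), s >= 1."""
    lengths = range(1, s_max + 1)
    weights = [math.exp(-s / gamma) for s in lengths]
    q = sum(weights)  # normalization constant
    return sum(s * w for s, w in zip(lengths, weights)) / q


print(mean_structure_length([4, 2, 2, 3]))  # 2.75
```

Under the stated idealization, `exponential_msl(1.40)` evaluates to roughly 1.96, i.e., just below the two-word production size.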



4 The MLU is the Mean Length of Utterance, i.e., the average length
of a child's utterances, measured in either words or morphemes.







FIG. 6 Sudden changes in network organization from the language acquisition model (see text). In (a) and (b) we display the
largest subgraph before (a) and right after (b) the transition. The graphs share the basic change from tree-like to scale-free
structure, although they exhibit higher clustering coefficients. In (c) a blow-up of (b) is shown, indicating the presence of a few hubs
that are connected among them both directly and through secondary connectors.



We incorporate into the data-driven model our knowledge of structure lengths.
We first construct, for each corpus, a random syntactic network that shares
the statistics of word frequencies and structure lengths of the corresponding
data set. The structure length can be interpreted, in cognitive terms, as
some kind of working memory and might be the footprint of some maturational
constraints (Elman, 1993; Newport, 1990). For simplicity, we assume that the
probability of the $i$-th most frequent word is a scaling law:



$$p_w(i) = \frac{1}{Z}\, i^{-\beta} \qquad (6)$$

with $1 \le i \le N_w(T)$, $\beta \approx 1$ and $Z$ is the normalization
constant:

$$Z = \sum_{i=1}^{N_w(T)} i^{-\beta} \qquad (7)$$

(notice that $Z$ depends on lexicon size, $N_w(T)$, which grows slowly at
this stage). However, the actual word frequency is affected by other corpus
features. In particular, our corpora are highly redundant, with many
duplicated structures, but we build our nets ignoring such redundancies,
since we are interested in the topological patterns of use. For every corpus
$T$ with $N_s(T)$ distinct structures, we compute the distribution of
structure lengths $p_T(s)$, $1 \le T \le 11$. From $N_w(T)$, $p_w(i)$,
$N_s(T)$ and $p_T(s)$, we generate a random syntactic network for every stage
$1 \le T \le 11$ (see fig. 7). Given a lexicon with $N_w(T)$ different items,
labeled as $a_1, \ldots, a_{N_w(T)}$, the model algorithm goes as follows:

1. Generate a random positive integer $s$ with probability $p_T(s)$.

2. Choose $s$ different "words" from the lexicon, $a_1, \ldots, a_s$,
each word with probability $p(a_i) \propto i^{-\beta}$, with
$\beta \approx 1$.

3. Trace an arc between every two successive words,
thus generating an unstructured string of $s$ nodes.



4. Repeat (1), (2) and (3) until N S (T) structures are 
generated. 

5. Aggregate all the obtained strings in a single, global 
graph. 
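
The five steps above can be sketched in a few lines. This is an illustrative reading of the algorithm, not the authors' implementation: words are represented by their frequency rank, "different words" is enforced by redrawing duplicates, links are treated as undirected, and the length distribution passed in the example is a made-up placeholder rather than a measured $p_T(s)$.

```python
import random
from itertools import accumulate


def random_syntax_network(n_words, n_structures, length_probs, beta=1.0, seed=0):
    """Aggregate random word strings into a graph (dict: rank -> neighbor set).

    n_words      -- lexicon size; word i is the i-th most frequent one
    n_structures -- number of strings to generate
    length_probs -- dict mapping structure length s to its probability
    """
    rng = random.Random(seed)
    words = list(range(1, n_words + 1))
    cum_weights = list(accumulate(i ** -beta for i in words))  # Zipf-like
    lengths = sorted(length_probs)
    length_weights = [length_probs[s] for s in lengths]
    adj = {}
    for _ in range(n_structures):
        # Step 1: draw a structure length (capped by the lexicon size).
        s = min(rng.choices(lengths, weights=length_weights, k=1)[0], n_words)
        # Step 2: draw s different words, redrawing any repeated word.
        string = []
        while len(string) < s:
            word = rng.choices(words, cum_weights=cum_weights, k=1)[0]
            if word not in string:
                string.append(word)
        # Step 3: link successive words; step 5: aggregate into one graph.
        for u, v in zip(string, string[1:]):
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj  # Step 4 is the loop over n_structures.


# Placeholder length distribution, for illustration only.
net = random_syntax_network(200, 300, {1: 0.4, 2: 0.35, 3: 0.15, 4: 0.1}, seed=1)
```

Strings of length one add no links, so words that only ever appear alone remain outside the graph, in line with the isolated words observed in the data.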

In spite of the small number of assumptions made, the above model reproduces
many of the topological traits observed in real networks. To begin with, we
clearly observe the sudden transition from tree-like networks to scale-free
networks (see fig. 6). Furthermore, typical network properties, such as
clustering, degree distribution or path lengths seem to fit real data
successfully (see
fig. ©). The very good agreement between global pat- 
terns of network topology is remarkable given the lack of 
true syntax. It indicates that some essential properties of syntax networks
come "for free". In other words, both the small world and the scale-free
architecture of syntax graphs would be spandrels: although this type of
network provides important advantages (such as highly efficient and robust
network interactions), they would be a byproduct of Zipf's law and increased
neural complexity. These results thus support the non-adaptive nature of
language evolution.

However, particularly beyond the transition, a detailed analysis is able to
find important deviations between data and model predictions. This becomes
especially clear by looking at small subgraphs of connected words. Studying
small size subgraphs allows us to explore local correlations among units.
Such correlations are likely to be closer to the underlying rules of network
construction, since they are limited specifically to direct node-node
relations and their frequency. We have found that the subgraph census reveals
strong deviations from the model due to the presence of grammatical
constraints, i.e., non-trivial rules to build the strings.

FIG. 7 Algorithm for network growth. The model uses as input a Zipf distribution of "words" and
the probability $p_T(s)$ of finding a structure of size $s$ in a given corpus. At each step we
choose $s$ words from the list, each word with a probability proportional to its frequency. A
link is then established between every two successive words, generating an unstructured string
of $s$ nodes. We repeat the process a number of times and aggregate all the obtained strings in
a global graph. $p_T(s)$ can be interpreted as the footprint of a kind of working memory, and
follows an exponential distribution (as shown in fig. 5).

In figure 9 we display the so-called subgraph census plot (Holland and Leinhardt, 1970;
Wasserman and Faust, 1994) for both real (circles) and simulated (squares) networks. Here the
frequencies of observed subgraphs of size three are shown in decreasing order for the real
case. For the simulated networks, we have averaged the subgraph frequencies over 50 replicas.
Several obvious differences are observed between the two censuses. The deviations are mainly
due to the hierarchical relations displayed by a typical syntactic structure, and to the fact
that lexical items tend to play the same specific role in different structures (see
fig. 9b-d). Specifically, we find that the asymmetries in syntactic relations induce the
overabundance of certain subgraphs and constrain the presence of others. Especially relevant
is the low abundance of the third type of subgraph compared with the model prediction. This
deviation may be due to the organizing role of functional words (mainly out-degree hubs) in
grammar. Indeed, consistent with this interpretation, we find that the first type of subgraph
(related to out-degree hubs) is more abundant than the model predicts.
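
A census of size-three subgraphs like the one discussed above can be sketched as follows. This is an illustrative sketch under our own assumptions, not the code used to produce the census plot: the network is given as a set of directed arcs, only connected triples of nodes are counted, and each triple is classified by a canonical label that is the same for all isomorphic subgraphs. The brute-force loop over all triples is only practical for small graphs, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, permutations


def canonical_form(triple, arcs):
    """Label of the induced 3-node subgraph, invariant under node relabeling.

    For every ordering (a, b, c) of the triple we write down which of the
    six possible arcs are present and keep the smallest such code.
    """
    best = None
    for a, b, c in permutations(triple):
        pairs = [(a, b), (b, a), (a, c), (c, a), (b, c), (c, b)]
        code = tuple(int(p in arcs) for p in pairs)
        if best is None or code < best:
            best = code
    return best


def is_connected(triple, arcs):
    """A triple is connected if at least two of its three pairs are linked."""
    a, b, c = triple
    linked = [
        (x, y) in arcs or (y, x) in arcs
        for x, y in ((a, b), (a, c), (b, c))
    ]
    return sum(linked) >= 2


def subgraph_census(arcs):
    """Count connected 3-node subgraphs of each isomorphism class."""
    nodes = list({node for arc in arcs for node in arc})
    census = Counter()
    for triple in combinations(nodes, 3):
        if is_connected(triple, arcs):
            census[canonical_form(triple, arcs)] += 1
    return census


def average_census(replicas):
    """Average the census over several arc sets (e.g. simulated replicas)."""
    total = Counter()
    for arcs in replicas:
        total.update(subgraph_census(arcs))
    return {label: count / len(replicas) for label, count in total.items()}


if __name__ == "__main__":
    # Toy graph: one out-degree hub (node 0) pointing to three other nodes.
    toy = {(0, 1), (0, 2), (0, 3)}
    for label, count in subgraph_census(toy).most_common():
        print(label, count)
```

Sorting the real census by decreasing frequency and plotting the averaged model census in the same order gives the kind of comparison described in the text.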







FIG. 8 Changes in the structure of syntax model networks (compare with fig. 3). Here we show:
(a) the average path length L, (b) the number of links (L) and lexical items (N) and (c) the
clustering coefficient C. An example of the resulting scale-free (SF) distributions is also
shown in (d).



The second interesting deviation is given by the changing status of hubs. As previously
described, in the prefunctional period hubs are semantically degenerated words, such as that
or it, whereas beyond the transition hubs are functional words. This observation seems to be
consistent with a recent proposal to understand the emergence of functional items in child
grammars. In short, a purely articulatory strategy introduces a new sound (mainly the a) that
is rapidly predated by the syntactic system when it is mature enough (Veneziano and Sinclair,
2000). This would imply the reuse of an existing phonetic element and would explain the
astonishing increase in appearance that such items experience. If we follow the changes in the
number of links displayed by the hubs in the simulated system, no such exchange is ever
observed. Instead, their degree simply keeps growing throughout the process (not shown).



V. DISCUSSION 

Our study reveals two clearly differentiated behaviors in the early stages of language
acquisition. Rules governing both grammatical and global behavior seem to be qualitatively and
quantitatively different. Could we explain the transition in terms of self-organizing or
purely externally driven mechanisms? Clearly not, given the special features exhibited by our
evolving webs, not shared by any current model of evolving networks (Dorogovtsev and Mendes,
2001, 2003). Beyond the transition, some features diverge dramatically from the
pre-transition graph, particularly the changing role of the hubs. Such features cannot be
explained by external factors (such as communication constraints among individuals). Instead,
they seem tied to changes in the internal machinery of grammar. The sharp transition from
small tree-like graphs to much larger scale-free nets, and the sudden change in the nature of
hubs, are the footprints of the emergence of new, powerful rules of exploration of the
combinatorial space, i.e., the emergence of full adult syntax. This seems to support the
hypotheses suggested by Hauser et al. (Hauser et al., 2003); see also (Nowak and Krakauer,
1999).

FIG. 9 Subgraph census plot for both real (circles) and simulated (squares) networks. As we
can see in (a), there exists an overabundance of the first two subgraphs due to grammatical
restrictions on the role of the syntactic head (see text). (b) and (c) are examples of the
kind of nodes that participate in such small subgraphs. Beyond these two subgraphs, we find a
sharp decay in abundance compared with the model. The discrepancy is most visible for the
third motif studied (d), which is abundant in the model but scarce in the real networks.

Furthermore, we have presented a novel approach to language acquisition based on a simple,
data-driven model. Previous modeling approaches based on self-organization cannot reproduce
the observed patterns of change displayed by syntax graphs. Our main goal was to explore the
potential roles of adaptive versus non-adaptive components in shaping syntax networks as they
change in time. The model is able to reproduce some fundamental traits. Specifically, we find
that: (a) the global architecture of syntactic nets obtained during the acquisition process
can be reproduced by combining Zipf's law with a growing working memory, and (b) strong
deviations are observed when looking at the behavior of hubs and the distribution of subgraph
abundances. Such disagreements cannot be fixed by additional rules. Instead, they indicate the
presence of some innate, hard-wired component related to the combinatorial power of the
underlying grammatical rules that is triggered at some point of the child's cognitive
development. Our study supports the view that the topological organization of syntactic
networks is a spandrel, a byproduct of communication and neural constraints. But the marked
differences found here cannot be reduced to such a scenario and need to be of adaptive nature.
Furthermore, our analysis provides a quantitative argument to go beyond statistics in the
search for fundamental rules of syntax, as was argued early on in (Miller and Chomsky, 1963).

A further line of research should extend the analysis to other (typologically different)
languages and clarify the nature of the innovation. Preliminary work using three different
European languages supports our previous results (Corominas-Murtra et al., unpublished work).
Moreover, modeling the transitions from finite grammars to unbounded ones by means of
connectionist approximations (Szathmáry et al., 2007) could shed light on the neuronal
prerequisites canalizing the acquisition process towards a fully developed grammar as
described and measured by our network approach.